Fix some query extraction bugs.#29283
Merged
jpountz merged 1 commit intoelastic:masterfrom Apr 3, 2018
Merged
Conversation
While playing with the percolator I found two bugs: - Sometimes we set a min_should_match that is greater than the number of extractions. While this doesn't cause direct trouble, it does when the query is nested into a boolean query and the boolean query tries to compute the min_should_match for the entire query based on its own min_should_match and those of the sub queries. So I changed the code to throw an exception when min_should_match is greater than the number of extractions. - Boolean queries claim matches are verified when in fact they shouldn't. This is due to the fact that boolean queries assume that they are verified if all sub clauses are verified but things are more complex than that, eg. conjunctions that are nested in a disjunction or disjunctions that are nested in a conjunction can generally not be verified without running the query.
Collaborator
|
Pinging @elastic/es-search-aggs |
martijnvg
approved these changes
Mar 29, 2018
Member
martijnvg
left a comment
There was a problem hiding this comment.
LGTM - Thanks for catching and fixing these bugs!
| numOptionalClauses++; | ||
| } | ||
| } | ||
| if (minimumShouldMatch > numOptionalClauses) { |
Member
There was a problem hiding this comment.
👍 because we know query will never match
| Result subResult = analyze(clause.getQuery(), version); | ||
| if (subResult.matchAllDocs == false && subResult.extractions.isEmpty()) { | ||
| // doesn't match anything | ||
| return subResult; |
| } | ||
| } | ||
| builder.setMinimumNumberShouldMatch(numShouldClauses); | ||
| builder.setMinimumNumberShouldMatch(randomIntBetween(0, numShouldClauses)); |
Member
There was a problem hiding this comment.
Did these changes to the randomized test catch the two bugs?
Contributor
Author
There was a problem hiding this comment.
I tried to make them more likely to catch bugs but unfortunately they did not.
jasontedor
added a commit
to jasontedor/elasticsearch
that referenced
this pull request
Apr 3, 2018
* master: Reindex: Fix error in delete-by-query rest spec (elastic#29318) Improve similarity integration. (elastic#29187) Fix some query extraction bugs. (elastic#29283) [Docs] Correct experimental note formatting Move Nullable into core (elastic#29341) [Docs] Update getting-started.asciidoc (elastic#29294) Elasticsearch 6.3.0 is now on Lucene 7.3. [DOCS] Refer back to index API for full-document updates in _update API section (elastic#28677) Fix missing comma in ingest-node.asciidoc (elastic#29343) Improve exception handling on TransportMasterNodeAction (elastic#29314) Don't break allocation if resize source index is missing (elastic#29311) Use fixture to test repository-s3 plugin (elastic#29296) Fix NDCG for empty search results (elastic#29267) Pass through script params in scripted metric agg (elastic#29154) Fix Eclipse build. Upgrade to lucene-7.3.0-snapshot-98a6b3d. (elastic#29298) Painless: Remove extraneous INLINE constant. (elastic#29340)
jpountz
added a commit
to jpountz/elasticsearch
that referenced
this pull request
Apr 3, 2018
While playing with the percolator I found two bugs: - Sometimes we set a min_should_match that is greater than the number of extractions. While this doesn't cause direct trouble, it does when the query is nested into a boolean query and the boolean query tries to compute the min_should_match for the entire query based on its own min_should_match and those of the sub queries. So I changed the code to throw an exception when min_should_match is greater than the number of extractions. - Boolean queries claim matches are verified when in fact they shouldn't. This is due to the fact that boolean queries assume that they are verified if all sub clauses are verified but things are more complex than that, eg. conjunctions that are nested in a disjunction or disjunctions that are nested in a conjunction can generally not be verified without running the query.
jasontedor
added a commit
to jasontedor/elasticsearch
that referenced
this pull request
Apr 3, 2018
* master: Build: Fix Java9 MR build (elastic#29312) Reindex: Fix error in delete-by-query rest spec (elastic#29318) Improve similarity integration. (elastic#29187) Fix some query extraction bugs. (elastic#29283) [Docs] Correct experimental note formatting Move Nullable into core (elastic#29341) [Docs] Update getting-started.asciidoc (elastic#29294) Elasticsearch 6.3.0 is now on Lucene 7.3. [DOCS] Refer back to index API for full-document updates in _update API section (elastic#28677) Fix missing comma in ingest-node.asciidoc (elastic#29343) Improve exception handling on TransportMasterNodeAction (elastic#29314) Don't break allocation if resize source index is missing (elastic#29311) Use fixture to test repository-s3 plugin (elastic#29296) Fix NDCG for empty search results (elastic#29267) Pass through script params in scripted metric agg (elastic#29154) Fix Eclipse build. Upgrade to lucene-7.3.0-snapshot-98a6b3d. (elastic#29298) Painless: Remove extraneous INLINE constant. (elastic#29340) Remove HTTP max content length leniency (elastic#29337) Begin moving XContent to a separate lib/artifact (elastic#29300)
jpountz
added a commit
that referenced
this pull request
Apr 4, 2018
While playing with the percolator I found two bugs: - Sometimes we set a min_should_match that is greater than the number of extractions. While this doesn't cause direct trouble, it does when the query is nested into a boolean query and the boolean query tries to compute the min_should_match for the entire query based on its own min_should_match and those of the sub queries. So I changed the code to throw an exception when min_should_match is greater than the number of extractions. - Boolean queries claim matches are verified when in fact they shouldn't. This is due to the fact that boolean queries assume that they are verified if all sub clauses are verified but things are more complex than that, eg. conjunctions that are nested in a disjunction or disjunctions that are nested in a conjunction can generally not be verified without running the query.
jasontedor
added a commit
to jasontedor/elasticsearch
that referenced
this pull request
Apr 4, 2018
* 6.x: Improve similarity integration. (elastic#29187) Fix some query extraction bugs. (elastic#29283) Fixed quote_field_suffix in query_string (elastic#29332) TEST: Update negative byte size setting error msg Fix bwc in GeoDistanceQuery serialization (elastic#29325)
martijnvg
added a commit
that referenced
this pull request
Apr 5, 2018
* es/master: (68 commits) Allow using distance measure in the geo context precision (#29273) Disable failing query in QueryBuilderBWCIT. Fixed quote_field_suffix in query_string (#29332) Use fixture to test repository-url module (#29355) Remove undocumented action.master.force_local setting (#29351) Enhance error for out of bounds byte size settings (#29338) Fix QueryAnalyzerTests. Fix HasChildQueryBuilderTests to not use the `classic` similarity. [Docs] Correct javadoc of GetIndexRequest (#29364) Make TransportRankEvalAction members final Add awaits fix for a query analyzer test Check presence of multi-types before validating new mapping (#29316) Add awaits fix for HasChildQueryBuilderTests Remove silent batch mode from install plugin (#29359) Align cat thread pool info to thread pool config (#29195) Track Lucene operations in engine explicitly (#29357) Build: Fix Java9 MR build (#29312) Reindex: Fix error in delete-by-query rest spec (#29318) Improve similarity integration. (#29187) Fix some query extraction bugs. (#29283) ...
martijnvg
added a commit
that referenced
this pull request
Apr 5, 2018
* es/6.x: (68 commits) Add note to migration docs on silent batch mode (#29365) Allow using distance measure in the geo context precision (#29273) Disable failing query in QueryBuilderBWCIT. Improve similarity integration. (#29187) Fix some query extraction bugs. (#29283) Fixed quote_field_suffix in query_string (#29332) TEST: Update negative byte size setting error msg Fix bwc in GeoDistanceQuery serialization (#29325) Move testMappingConflictRootCause to different class Enhance error for out of bounds byte size settings (#29338) [Docs] Correct javadoc of GetIndexRequest (#29364) Check presence of multi-types before validating new mapping (#29316) Make TransportRankEvalAction members final Pass through script params in scripted metric agg (#29154) Remove silent batch mode from install plugin (#29359) Track Lucene operations in engine explicitly (#29357) Build: Fix Java9 MR build (#29312) Reindex: Fix error in delete-by-query rest spec (#29318) Move Nullable into core (#29341) [Docs] Correct experimental note formatting ...
jasontedor
added a commit
to jasontedor/elasticsearch
that referenced
this pull request
Apr 5, 2018
* master: (110 commits) Remove undocumented action.master.force_local setting (elastic#29351) Enhance error for out of bounds byte size settings (elastic#29338) Fix QueryAnalyzerTests. Fix HasChildQueryBuilderTests to not use the `classic` similarity. [Docs] Correct javadoc of GetIndexRequest (elastic#29364) Make TransportRankEvalAction members final Add awaits fix for a query analyzer test Check presence of multi-types before validating new mapping (elastic#29316) Add awaits fix for HasChildQueryBuilderTests Remove silent batch mode from install plugin (elastic#29359) Align cat thread pool info to thread pool config (elastic#29195) Track Lucene operations in engine explicitly (elastic#29357) Build: Fix Java9 MR build (elastic#29312) Reindex: Fix error in delete-by-query rest spec (elastic#29318) Improve similarity integration. (elastic#29187) Fix some query extraction bugs. (elastic#29283) [Docs] Correct experimental note formatting Move Nullable into core (elastic#29341) [Docs] Update getting-started.asciidoc (elastic#29294) Elasticsearch 6.3.0 is now on Lucene 7.3. ...
jpountz
added a commit
that referenced
this pull request
Apr 6, 2018
While playing with the percolator I found two bugs: - Sometimes we set a min_should_match that is greater than the number of extractions. While this doesn't cause direct trouble, it does when the query is nested into a boolean query and the boolean query tries to compute the min_should_match for the entire query based on its own min_should_match and those of the sub queries. So I changed the code to throw an exception when min_should_match is greater than the number of extractions. - Boolean queries claim matches are verified when in fact they shouldn't. This is due to the fact that boolean queries assume that they are verified if all sub clauses are verified but things are more complex than that, eg. conjunctions that are nested in a disjunction or disjunctions that are nested in a conjunction can generally not be verified without running the query.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
While playing with the percolator I found two bugs:
extractions. While this doesn't cause direct trouble, it does when the query
is nested into a boolean query and the boolean query tries to compute the
min_should_match for the entire query based on its own min_should_match and
those of the sub queries. So I changed the code to throw an exception when
min_should_match is greater than the number of extractions.
is due to the fact that boolean queries assume that they are verified if all
sub clauses are verified but things are more complex than that, eg.
conjunctions that are nested in a disjunction or disjunctions that are nested
in a conjunction can generally not be verified without running the query.
For instance a query like
((a AND b) OR c OR d)extracts all terms and claimsit is verified, which is not true.